in the cluster. The cross-talk was distinguishable from true
noise because in the case of noise, only a single DXPR sequence 
had low similarity to some other cluster. Based on these data, 
the NADP-binding domain of E. coli DXPR was predicted to contain 
a Rossmann fold. 

REMARKS 

Pages 1 through 2 are submitted herewith containing 
Sequences 1 through 2, formatted in accordance with the 
conventions set forth by PatentIn. Also enclosed is a copy of
the sequence listing in computer readable form. The newly 
submitted pages and amendments to the fifth paragraph on page 4 
merely identify the sequences originally set forth in the 
application and do not add new matter. The other amendments have 
been made to the specification to correct obvious errors and do 
not add new matter. Accordingly, entry of these amendments and 
pages is respectfully requested. 

A marked up copy of the amended paragraphs of the 
specification is attached hereto as Appendix A. Language to be 
added is indicated by underlining while language to be deleted is 
bracketed. 

Applicants also submit herewith a Statement Under 37
C.F.R. § 1.821(f) and (g).

Applicants also submit herewith a supplemental 
bibliographic data sheet pursuant to 37 C.F.R. § 1.76(c). The 
supplemental bibliographic data sheet changes the identification
of the citizenship country for Hugo O. Villar to Argentina in 
accordance with the executed Declaration for Patent Application 
that is being filed concurrently and addressed to Commissioner 
for Patents, Washington, D.C. 20231, Attention: Box Missing 
Parts. The supplemental bibliographic data sheet is also 
modified to change the order of the inventors in accordance with 
the application cover sheet as filed on December 21, 2001, to add 
the correspondence customer number, and to change the 
representative information in accordance with the executed power 
of attorney which is being filed concurrently and addressed to 
Commissioner for Patents, Washington, D.C. 20231, Attention: 
Box Missing Parts. 
CONCLUSION



In light of the Amendments and Remarks herein,
Applicants submit that the claims are now in condition for
allowance and respectfully request a notice to this effect. The
Examiner is invited to call the undersigned agent or Cathryn
Campbell should there be any questions.



Respectfully submitted,
APPENDIX A



A marked up copy of the fifth paragraph on page 4 
(lines 21-22) follows: 



Figure 4 shows a multiple alignment of E. coli
DXPR (SEQ ID NO:1) to S. aureas homoserine dehydrogenase (1EBF_A;
SEQ ID NO:2).



A marked up copy of the second paragraph on page 11 
(lines 14-29) follows: 



As used herein, the term "ligand" refers to a
molecule that can specifically bind to a polypeptide. Specific 
binding, as it is used herein, refers to binding that is 
detectable over non-specific interactions by quantifiable assays 
well known in the art such as those that measure association 
rates, dissociation rates or equilibrium association or
dissociation constants. A ligand can be essentially any type of 
natural or synthetic molecule including, for example, a 
polypeptide, nucleic acid, carbohydrate, lipid, amino acid, 
nucleotide or any organic derived compound. The term also 
encompasses a cofactor or a substrate of a polypeptide having 
enzymatic activity, or substrate that is inert to catalytic 
conversion by the bound polypeptide. Specific binding to a 
polypeptide can be due to covalent or non-covalent [non covalent]
interactions.
A marked up copy of the paragraph spanning pages 17 and
18 (page 17, line 22, through page 18, line 11) follows: 

The methods can also be used with a set of amino acid 
sequences that are preselected for a particular structural or 
functional characteristic. A preselected range of structural or 
functional characteristics for a set of polypeptides used in the 
methods can include, for example, binding to a particular ligand, 
interacting with a particular biological component such as 
another protein, common enzymatic function, common structural 
motifs or folds, common subcellular localization or co-expression 
due to a particular stimulus or developmental or growth stage. 
Those skilled in the art will be able to preselect a set of amino 
acid sequences based on that which is known for particular 
sequences as provided in the scientific literature or in 
annotations [anotations] of particular databases. Examples of 
subsets of polypeptides from which subsets can be identified in 
the methods of the invention include, for example, kinases, 
G-protein [G-Protein] coupled receptors, nuclear factors,
proteases, dehydrogenases, phosphatases, transcription factors, 
nucleotide binding enzymes or membrane proteins. 

A marked up copy of the paragraph spanning pages 19 and 
20 (page 19, line 15, through page 20, line 8) follows: 



A set of amino acid sequences used in the methods can 
be translated from one or more nucleic acid sequences in a 
nucleic acid sequence database. Accordingly, the methods can 
include a step of translating the coding regions of a nucleic 
acid sequence into amino acid sequences. A coding region of a
nucleic acid sequence can be translated according to the
appropriate genetic code for the organism from which the nucleic 
acid sequence is derived. The coding region can be a 
predetermined portion of the sequence or, in the case where exons
and introns are present, a predetermined set of spliced portions
identified, for example, from annotations of the nucleic acid in 
the database. Alternatively, the coding region can be predicted 
or determined based on methods known in the art for predicting 
gene structure or coding sequence location. Computational 
methods for predicting the coding region of a nucleic acid
sequence are known in the art as described in Pevzner,
Computational Molecular Biology: An Algorithmic Approach, The MIT
Press, Cambridge MA (2000), and include, for example, statistical
approaches based on codon usage or in-frame hexamer count,
similarity based approaches, spliced alignment approaches and 
Hidden Markov model-based approaches such as GENSCAN.
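The translation step can be illustrated with a minimal Python sketch. It assumes the coding region is already known. The region is given either as the whole input or as a list of hypothetical 0-based, half-open exon coordinates that are spliced in the order given. The sketch uses the standard genetic code (NCBI translation table 1) only, so organisms that use a different genetic code would need a different table. Coding-region prediction methods such as GENSCAN are not reproduced here.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the bases in T, C, A, G order, matching the layout of the amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_coding_region(nucleotides, exons=None):
    """Translate a coding region into a one-letter amino acid sequence.

    exons: optional list of (start, end) pairs, 0-based and half-open, spliced
    together in the order given (a hypothetical input format). Translation
    stops at the first stop codon; codons with ambiguous bases become 'X'.
    """
    seq = nucleotides.upper().replace("U", "T")
    if exons:
        seq = "".join(seq[start:end] for start, end in exons)
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate_coding_region("ATGGCCTAA"))                          # MA
print(translate_coding_region("ATGCCCCGCCTAA", [(0, 3), (7, 13)]))   # MA (CCCC spliced out)
```

Supplying the exon list separately keeps the translation step independent of how the coding region was identified, whether from database annotations or from a prediction program.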

A marked up copy of the paragraph spanning pages 21 and 
22 (page 21, line 16, through page 22, line 3) follows: 

The dynamic programming algorithm is a mathematically
rigorous method of pairwise sequence comparison and can be used
according to several variants including, for example,
Needleman-Wunsch (Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970)),
Sellers (Sellers, J. Appl. Math. 26:787-793 (1974)), quasi-global
alignment (Sellers, Proc. Natl. Acad. Sci. USA 76:3041-3041
(1979)) and Smith-Waterman (Smith and Waterman, J. Mol. Biol. 
147:195-197 (1981) and Waterman and Eggert, J. Mol. Biol. 
197:723-728 (1987)). The dynamic programming [programing]
algorithm is rigorous and therefore [therefor], well suited for
finding optimum alignments and sequence comparison scores for a
set of amino acid sequences. The dynamic programming algorithm,
being rigorous, is also computationally demanding. In
applications of the methods in which large sequence sets are used
or less rigorous comparison is required, a heuristic search
algorithm can be used.
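The Python function below is a minimal sketch of the dynamic programming recurrence. It computes either a global (Needleman-Wunsch) or a local (Smith-Waterman) alignment score. For brevity it assumes a simple match/mismatch score and a linear gap penalty, with arbitrary example values. Alignments of amino acid sequences would normally use a substitution matrix such as BLOSUM62 and affine gap penalties. The sketch also returns only the score and performs no traceback of the alignment itself.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2, local=True):
    """Optimal alignment score of sequences a and b by dynamic programming.

    local=False gives a Needleman-Wunsch global score; local=True gives a
    Smith-Waterman local score. The scoring values are illustrative defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(rows):
            H[i][0] = i * gap
        for j in range(cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local alignment: a cell never drops below zero, and the best
                # score may end anywhere in the matrix.
                score = max(0, score)
                best = max(best, score)
            H[i][j] = score
    return best if local else H[rows - 1][cols - 1]


print(alignment_score("ACGT", "AGT", local=False))    # 4
print(alignment_score("TTACGTTT", "GGACGTGG"))        # 8
```

Every cell of the (len(a)+1) by (len(b)+1) matrix is filled. The time and memory grow with the product of the two sequence lengths, which is why heuristic searches are preferred for large sequence sets.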

A marked up copy of the second paragraph on page 22 
(lines 4-28) follows: 

Heuristic algorithms that can be used in the methods of 
the invention include, for example, BLAST and FASTA. BLAST, 
Basic Local Alignment Search Tool, uses a heuristic algorithm 
that reduces the computational requirements of the Smith-Waterman 
algorithm by seeking local alignments prior to comparing 
sequences in a restricted version of the Smith-Waterman 
algorithm. BLAST is therefore able to detect relationships among 
sequences including those which share only isolated regions of 
similarity including, for example, protein domains (Altschul et 
al., J. Mol. Biol. 215:403-410 (1990)). BLAST divides sequences 
into a list of overlapping words and extends the list to include 
all words that score above a predefined matrix-defined threshold. 
This threshold limits the number of matches that will be passed 
from the heuristic screening step to the comparison step. Those 
skilled in the art can use BLAST according to default
parameters as described by Tatusova and Madden, FEMS Microbiol. Lett.
174:247-250 (1999) or on the National Center for Biotechnology 
Information web page at ncbi.nlm.nih.gov/BLAST/
[ncbi.nlm.gov/BLAST/]. Alternatively, parameters such as the
length of the words, value of the predefined matrix-defined 
threshold or type of similarity matrix utilized can be adjusted 
to suit a particular application of the methods of the invention. 
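The word-list step described above can be sketched in Python as follows. This illustrates the idea only and is not NCBI BLAST code. The toy four-letter alphabet, the +5/-4 scoring function, the word length and the threshold are assumptions chosen to keep the example small. Protein BLAST would instead score words with a substitution matrix such as BLOSUM62 over the 20 amino acids. The extension of seed hits into alignments is omitted.

```python
from itertools import product


def neighborhood(word, alphabet, score, threshold):
    """All words of the same length scoring at least `threshold` against `word`."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(x, y) for x, y in zip(word, candidate))
        if total >= threshold:
            hits.append("".join(candidate))
    return hits


def build_word_list(query, w, alphabet, score, threshold):
    """Map every high-scoring word to the query positions whose word it matches."""
    table = {}
    for pos in range(len(query) - w + 1):
        for word in neighborhood(query[pos:pos + w], alphabet, score, threshold):
            table.setdefault(word, []).append(pos)
    return table


def seed_hits(table, subject, w):
    """(query position, subject position) pairs passed on to the comparison step."""
    return [
        (qpos, spos)
        for spos in range(len(subject) - w + 1)
        for qpos in table.get(subject[spos:spos + w], [])
    ]


def toy_score(x, y):
    # Hypothetical scoring scheme: +5 for identical letters, -4 otherwise.
    return 5 if x == y else -4


table = build_word_list("ACGT", 3, "ACGT", toy_score, threshold=11)
print(table)                              # {'ACG': [0], 'CGT': [1]}
print(seed_hits(table, "TTACGT", 3))      # [(0, 2), (1, 3)]
```

Lowering the threshold to 6 in this toy scheme would admit, for each three-letter word, its nine single-mismatch neighbours as well. That raises sensitivity at the cost of passing more hits to the comparison step, which is the trade-off the threshold controls.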

A marked up copy of the paragraph spanning pages 23 and 
24 (page 23, line 17, through page 24, line 2) follows: 

FASTA uses a word search algorithm as a heuristic 
screen prior to performing a restricted Smith-Waterman alignment
(Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444-2448
(1988)). In the word search both the query and library sequences
are divided into overlapping words of specified length. The
lists of words for the query [Query] and library sequences are
compared in a matrix and the diagonal with the most matching 
words is taken as the region most likely to contain the best 
alignment. The results from the word search are used to identify 
sequences with sufficient similarity to use in the subsequent 
alignment step. Those skilled in the art can use default
parameters or adjust parameters such as word size, window size
for [the] defining the length of insertions or deletions one
sequence can accumulate relative to another, or the type of
similarity matrix utilized.
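The diagonal-counting idea behind the FASTA word search can be sketched in Python as below. The sketch is a simplification that rests on three assumptions. Only exact k-letter word matches are counted. A diagonal is defined as the library position minus the query position. The diagonal with the most word matches is returned without the scoring and joining of initial regions that FASTA itself performs before the restricted Smith-Waterman step.

```python
from collections import Counter, defaultdict


def best_diagonal(query, library, k):
    """Return (diagonal, word_matches) for the diagonal with the most k-word matches.

    A diagonal is identified by the offset library_position - query_position.
    Returns (None, 0) when the sequences share no word of length k.
    """
    # Index every overlapping library word of length k by its start positions.
    positions = defaultdict(list)
    for j in range(len(library) - k + 1):
        positions[library[j:j + k]].append(j)

    # Each query word matching a library word votes for one diagonal.
    votes = Counter()
    for i in range(len(query) - k + 1):
        for j in positions.get(query[i:i + k], []):
            votes[j - i] += 1

    if not votes:
        return None, 0
    return votes.most_common(1)[0]


print(best_diagonal("ACGTACGT", "TTACGTACGT", 2))   # (2, 7)
```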

A marked up copy of the paragraph spanning pages 26 and 
27 (page 26, line 26, through page 27, line 14) follows: 
The distance between [a] two similarity signatures,
that are represented as a first and second vector in high 
dimensional space, can be determined based on the distances 
separating the points of the first vector from the points of the 
second vector. A variety of distance measures that are known in the
art can be used in the methods of the invention including, for
example, Euclidean distance. Euclidean distance is the square
root of the sum of the squared differences between corresponding
elements of the two compared vectors. Another distance is the
Mahalanobis distance, which scales the difference in each 
coordinate by the inverse of the variance in that dimension as 
described, for example, in Mahalanobis, Proc. Natl. Inst. Sci.
India 2:49-55 (1936). The cosine of the angle between the two
vectors can also be computed and used as a distance metric. 
Hamming distance between two vectors is also useful in the 
methods of the invention and it is given by the count of the 
number of elements in which the two vectors differ. 
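The distance measures named above can be written compactly in Python. This sketch assumes each similarity signature is an equal-length list of numbers. The variance-scaled function follows the description above, that is, a Mahalanobis distance with a diagonal covariance and per-dimension variances supplied by the caller, rather than the general form with a full covariance matrix. Cosine distance is taken here as one minus the cosine of the angle, which is one common convention.

```python
import math


def _check(u, v):
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")


def euclidean(u, v):
    _check(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def variance_scaled(u, v, variances):
    """Mahalanobis distance assuming a diagonal covariance (independent dimensions)."""
    _check(u, v)
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(u, v, variances)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    _check(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def hamming(u, v):
    """Count of positions at which the two vectors differ."""
    _check(u, v)
    return sum(1 for a, b in zip(u, v) if a != b)


print(euclidean([0, 0], [3, 4]))          # 5.0
print(hamming([1, 2, 3], [1, 0, 3]))      # 1
```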

A marked up copy of the paragraph spanning pages 27 and 
28 (page 27, line 15, through page 28, line 2) follows: 

Distances that are particularly useful when binary 
sequence comparison scores are used include, for example, the 
exclusive OR, which is a reduction of a Hamming distance to a
binary case, again being a count of the number of elements 
differing between the two vectors that are compared. The 
Tanimoto coefficient is the ratio of bits set (where a bit set is 
a bit that is equal to 1) for both vectors to the total number of
bits set in either vector. A generalization of the Tanimoto 
coefficient is the Tversky Similarity, where both vectors can be
given different weighting as described in Sneath and Sokal, 
Numerical Taxonomy, W. H. Freeman, San Francisco (1973). Those
skilled in the art will recognize that this is only a partial 
list of the methods known in the art for measuring distance 
between vectors and will be able to use other known methods for 
measuring distance between vectors in [ion] the methods to 
determine the distance between sequence similarity signatures 
according to the teaching herein. 
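For binary signatures (lists of 0/1 comparison scores), the exclusive-OR count, the Tanimoto coefficient and the Tversky similarity can be sketched in Python as below. The Tversky weights alpha and beta are caller-chosen assumptions; setting both to 1 reproduces the Tanimoto coefficient. Treating two all-zero vectors as identical is an arbitrary convention adopted here. Tanimoto and Tversky are similarities, so a distance can be taken as one minus the value.

```python
def xor_distance(a, b):
    """Number of positions at which two 0/1 vectors differ."""
    return sum(x ^ y for x, y in zip(a, b))


def tversky(a, b, alpha, beta):
    """Tversky similarity of 0/1 vectors a and b with weights alpha and beta."""
    both = sum(x & y for x, y in zip(a, b))          # bits set in both vectors
    only_a = sum(x & (1 - y) for x, y in zip(a, b))  # set in a but not in b
    only_b = sum((1 - x) & y for x, y in zip(a, b))  # set in b but not in a
    denominator = both + alpha * only_a + beta * only_b
    return both / denominator if denominator else 1.0


def tanimoto(a, b):
    """Bits set in both vectors divided by bits set in either vector."""
    return tversky(a, b, 1.0, 1.0)


a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(xor_distance(a, b))   # 2
print(tanimoto(a, b))       # 0.5
```

Unequal weights, for example alpha=1 and beta=0, make the measure asymmetric. That can be useful when asking how much of one signature is contained in another.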

A marked up copy of the paragraph spanning pages 33 
through 35 (page 33, line 27, through page 35, line 7) follows: 

Common structural properties can be identified by 
comparing the three dimensional structures of two or more 
polypeptides or a bound ligand using methods known in the art 
including, for example, cluster analysis of structures, visual 
inspection and pairwise structural comparisons. Cluster analysis 
of structures is commonly performed by, but not limited to, 
partitioning methods or hierarchical methods as described, for 
example, in Kaufman and Rousseeuw, Finding Groups in Data: An
Introduction to Cluster Analysis, John Wiley and Sons Inc., New
York (1990). Partitioning methods that can be used include, for
example, partitioning around medoids [mediods], clustering large
applications, and fuzzy analysis, as described in Kaufman and
Rousseeuw, supra. Hierarchical methods useful in the invention
include, for example, agglomerative nesting, divisive analysis,
and monothetic analysis, as described in Kaufman and Rousseeuw,
supra. Algorithms for cluster analysis of molecular structures
are known in the art and include, for example, COMPARE (Chiron
Corp, 1995; distributed by Quantum Chemistry Program Exchange,
Indianapolis IN). COMPARE can be used to make all possible
pairwise comparisons between a set of conformations of 
polypeptides or bound ligands or portions thereof. COMPARE reads 
PDB files and uses a Ferro-Hermans ORIENT algorithm for a least
squares root mean square (RMS) fit. The structures can be
clustered into groups using the Jarvis-Patrick nearest neighbors
algorithm. Based on the RMS deviation between polypeptide 
structures or bound conformations of a ligand, or portions 
thereof, a list of 'nearest neighbors' for each structure is 
generated. Two structures are then grouped together or clustered 
if: (1) the RMS deviation is sufficiently small and (2) if both 
structures share a determined number of common 'neighbors'. Both
criteria are adjusted by the program to generate clusters based 
on a user defined cutoff for distance between individual 
clusters. Follow-up analysis can be conducted using InsightII to
verify structural clusters. Thus, two or more polypeptides can 
be confirmed as being in the same cluster or a polypeptide can be 
assigned to one of two or more proximal clusters based on common 
cluster assignment evaluated by both sequence based clustering 
and structure-based clustering.
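The two grouping criteria described for the Jarvis-Patrick step can be sketched in Python as follows. This is not the COMPARE program. The sketch assumes a precomputed symmetric matrix of pairwise RMS deviations and treats "sufficiently small" as a caller-supplied cutoff. As in the classic Jarvis-Patrick method, it also requires each structure to appear in the other's nearest-neighbor list. Qualifying pairs are joined transitively with a union-find structure. The parameter names k, min_shared and max_rmsd are hypothetical.

```python
def jarvis_patrick(dist, k, min_shared, max_rmsd=float("inf")):
    """Cluster items from a symmetric pairwise distance matrix; returns a label per item."""
    n = len(dist)
    # k nearest neighbors of each item, excluding the item itself.
    neighbors = [
        set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
        for i in range(n)
    ]
    parent = list(range(n))

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            close_enough = dist[i][j] <= max_rmsd                  # criterion (1)
            mutual = j in neighbors[i] and i in neighbors[j]
            shared = len(neighbors[i] & neighbors[j])              # criterion (2)
            if close_enough and mutual and shared >= min_shared:
                parent[find(i)] = find(j)

    # Relabel cluster representatives as 0, 1, 2, ... in order of first appearance.
    labels, relabel = [], {}
    for i in range(n):
        labels.append(relabel.setdefault(find(i), len(relabel)))
    return labels


positions = [0, 1, 2, 3, 100, 101, 102, 103]
matrix = [[abs(p - q) for q in positions] for p in positions]
print(jarvis_patrick(matrix, k=3, min_shared=2))   # [0, 0, 0, 0, 1, 1, 1, 1]
```

In practice the matrix entries would be RMS deviations between superposed structures or ligand conformations. Tightening max_rmsd or raising min_shared splits the data into more, smaller clusters.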

A marked up copy of the paragraph spanning pages 35 and 
36 (page 35, line 25, through page 36, line 22) follows: 

Using methods such as those described above, one 
skilled in the art will know how to identify structures that are 
substantially the same. For example, similarity can be evaluated 
according to the goodness of fit between two or more
three-dimensional models of a polypeptide or bound ligand, or fragments
thereof. Goodness of fit can be represented by a variety of 
parameters known in the art including, for example, the root mean 
square deviation (RMSD). A lower RMSD between structures
correlates with a better fit compared to a higher RMSD between 
structures (see for example, Doucet and Weber, Computer-Aided 
Molecular Design: Theory and Applications , Academic Press, San 
Diego, CA (1996)). Polypeptides having substantially the same 
structures can be identified by comparing mean RMSD values for 
the backbones of the polypeptides [thepolypeptides].
Polypeptides, or fragments thereof, having substantially the same 
structures can have a mean backbone RMSD compared to each other 
that is less than about 5 Å or less than about 3 Å. Those
skilled in the art will know that despite a high RMSD between 
overall structures indicating overall structural differences, two 
polypeptides can contain domains or other regions that are 
similar. Thus, a model used in comparing polypeptide structures 
can be that of the backbone structure of a domain or other region 
of the polypeptide. Bound conformations of a ligand having 
substantially the same structures can have a mean RMSD compared 
to each other that is less than about 1.1 Å [1.1A].
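The RMSD comparison can be sketched with NumPy as follows. The sketch assumes two equal-length arrays of matched atom coordinates in ångströms, for example backbone atoms of two polypeptides or of a domain, with the residue correspondence already established. It uses the Kabsch singular value decomposition method for the least-squares superposition. That choice of superposition algorithm is an assumption, not a method named in the text.

```python
import numpy as np


def rmsd(P, Q):
    """Root mean square deviation between matched coordinates, without superposition."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    return float(np.sqrt(np.mean(np.sum((P - Q) ** 2, axis=1))))


def superposed_rmsd(P, Q):
    """RMSD after optimal rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    # Remove translation by centering both coordinate sets on their centroids.
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return rmsd(Pc @ R.T, Qc)


P = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Q is P rotated 90 degrees about the z axis and translated by (5, 5, 5).
Q = [[5, 5, 5], [5, 6, 5], [4, 5, 5], [5, 5, 6]]
print(round(superposed_rmsd(P, Q), 6))   # 0.0
```

The resulting value could then be compared with a cutoff such as the values of about 3 Å (backbone) or about 1.1 Å (bound ligand conformations) given above.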

A marked up copy of the second paragraph on page 39 
(lines 7-13) follows: 
The invention can be used with any ligand that binds to
two or more different polypeptides having [havin] different 
sequences including, for example, chemical or biological 
molecules such as simple or complex organic molecules, metal- 
containing compounds, carbohydrates, peptides, peptidomimetics,
lipids, nucleic acids, and the like.

A marked up copy of the third paragraph on page 39 
(lines 14-26) follows: 

In one embodiment, the methods of the invention can be 
used with a ligand that is a nucleotide derivative including, for 
example, a nicotinamide adenine dinucleotide-related molecule. 
Nicotinamide adenine dinucleotide-related (NAD-related) molecules 
that can be used in the methods of the invention can be selected 
from the group consisting of oxidized nicotinamide adenine 
dinucleotide (NAD+), reduced nicotinamide adenine dinucleotide
(NADH), oxidized nicotinamide adenine dinucleotide phosphate
(NADP+), and reduced nicotinamide adenine dinucleotide phosphate
(NADPH). An NAD-related molecule can also be a mimetic of the
above-described [above- described] molecules.

A marked up copy of the paragraph spanning pages 46 and 
47 (page 46, line 9, through page 47 line 2) follows: 

In yet another representation, the conformer model can 
be a volume surrounding all or a subset of the bound 
conformations of a ligand bound to polypeptides in a cluster. A 
model showing volume can be useful for comparing other structures 
in a fitting format such that a structure which fits within the
volume of the model can be identified as substantially similar to 
the model. One approach that can be used to fit a structure to a 
volume is comparison of equivalent surface patches using gnomonic 
projection as described, for example, in Chau and Dean, J. Mol.
Graphics 7:130 (1989). Use of a gnomonic projection to compare
structures is also described in Doucet and Weber, Computer-Aided
Molecular Design: Theory and Applications, Academic Press, San
Diego CA (1996). Algorithms which can be used to fit a structure
to a volume are known in the art and include, for example, 
CATALYST (Molecular Simulations Inc., San Diego, CA) and THREEDOM 
which is a part of the INTERCHEM package which makes use of an 
Icosahedral Matching Algorithm (Bladon, J. Mol. Graphics 7:130
(1989)) for the comparison and [Bladon, J, Mol, Graphics 7:130
(1989) for the comparison and] alignment of structures. Methods
of identifying a binding compound by searching a database of 
structures using a gnomonic projection are described, for 
example, in U.S. patent application number 09/753,020, which is 
hereby incorporated by reference. 

A marked up copy of the paragraph spanning pages 78 and 
79 (page 78, line 15, through page 79, line 12) follows: 

Sequence comparison signatures were determined for the 
NAD(P)-binding sequences (including 28 DXPR sequences) in the
Swiss-Prot database [[12]] and clustering was performed as 
described in Examples I and II. The 28 DXPR sequences formed one 
cluster. When visualized in a comparison matrix, the DXPR 
cluster was proximal to other clusters. These other clusters 
were composed of aspartate semialdehyde dehydrogenase, homoserine
dehydrogenase, N-acetyl-γ-glutamyl phosphate reductoisomerase, or
glyceraldehyde 3-phosphate dehydrogenase, all of which share a
common NAD(P)-binding Rossmann fold. The proximity correlated
with local sequence identity between DXPR sequences and sequences 
of these other clusters, ranging from about 17 to 40% local 
sequence identity. Although the E-scores of these sequence 
identities were between 0.1 and 2.0, these clusters were
identified as related groups because multiple DXPR sequences 
systematically showed cross-talk to only the above mentioned 
sequence clusters. In particular, cross-talk was identified as 
low sequence identity (less than 30%) between the cluster 
containing DXPR and a few sequences belonging to other clusters, 
which showed a pattern that was distinct from a pattern observed 
in the cluster. The cross-talk [cross talk] was distinguishable 
from true noise because in the case of noise, only a single DXPR 
sequence had low similarity to some other cluster. Based on 
these data, the NADP-binding domain of E. coli DXPR was predicted 
to contain a Rossmann fold. 
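The distinction drawn above between cross-talk and noise can be expressed as a simple rule, sketched in Python below. Several details are illustrative assumptions: the input format (one tuple per hit from a member of the DXPR cluster to a sequence in another cluster, with identity as a fraction), the cluster labels, and the minimum of two distinct members. Only the 30% identity cutoff and the single-sequence criterion for noise come from the text. The sketch does not consider E-scores or assess how systematic the pattern is.

```python
from collections import defaultdict


def classify_inter_cluster_hits(hits, max_identity=0.30, min_members=2):
    """Label each other cluster as "cross-talk" or "noise".

    hits: iterable of (member_id, other_cluster, identity) tuples, one per hit
    from a member of the query cluster to a sequence in another cluster.
    Only low-identity hits (identity < max_identity) are considered.
    """
    members_hitting = defaultdict(set)
    for member, other_cluster, identity in hits:
        if identity < max_identity:
            members_hitting[other_cluster].add(member)
    # Several distinct members hitting the same cluster -> cross-talk;
    # a single member -> noise.
    return {
        other: "cross-talk" if len(members) >= min_members else "noise"
        for other, members in members_hitting.items()
    }


# Hypothetical hits from DXPR sequences to homoserine dehydrogenase (HSD)
# and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) clusters.
hits = [
    ("dxr1", "HSD", 0.25),
    ("dxr2", "HSD", 0.22),
    ("dxr3", "GAPDH", 0.18),
    ("dxr4", "HSD", 0.35),   # above the 30% cutoff, so ignored here
]
print(classify_inter_cluster_hits(hits))   # {'HSD': 'cross-talk', 'GAPDH': 'noise'}
```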